Computational Approaches to Arabic Script - based Languages

نویسندگان

  • Farhard Oroumchian
  • Ali Farghaly
چکیده

Discourse connectives can often signal multiple discourse relations, depending on their context. The automatic identification of the Arabic translations of seven English discourse connectives shows how these connectives are differently translated depending on their actual senses. Automatic labelling of English source connectives can help a machine translation system to translate them more correctly. The corpus-based analysis of Arabic translations also enables the definition of a connective-specific evaluation metric for machine translation, which is here validated by human judges on sample English/Arabic translation data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Arabic Script-Based Languages Deserve To Be Studied Linguistically

Arabic script-based languages are attracting increased attention for reasons that are regrettably far from their intrinsic linguistic interest. At the same time, statistical and corpus-based approaches to language processing are acquiring such dominance that it is becoming difficult for the advocates of other methods even to receive a hearing. I will argue that this is an alarming trend against...

متن کامل

Computer Processing Of Arabic Script-Based Languages. Current State And Future Directions

Arabic script-based languages do not belong to a single language family, and therefore exhibit different linguistic properties. To name just a few: Arabic is primarily a VSO language whereas Farsi is an SVO and Urdu is an SOV language. Both Farsi and Urdu have light verbs whereas Arabic does not. Urdu and Arabic have grammatical gender while Farsi does not. There are, however, linguistic and no...

متن کامل

Lexicon Reduction for Urdu/Arabic Script Based Character Recognition: A Multilingual OCR

Arabic script character recognition is challenging task due to complexity of the script and huge number of ligatures. We present a method for the development of multilingual Arabic script OCR (Optical Character Recognition) and lexicon reduction for Arabic Script and its derivative languages. The objective of the proposed method is to overcome the large dataset Urdu and similar scripts by using...

متن کامل

Analysis of Noori Nasta'leeq for major Pakistani languages

Nasta’leeq is a bidirectional, diagonal, non-monotonic, cursive, highly context-sensitive and very complex writing style for languages like Urdu, Punjabi, Balochi and Kashmiri. Each is written in a variant of the Perso-Arabic script. The style is characterized by well-formed orthographic rules that are passed down from generation to generation of calligraphers and old manuscripts. It is present...

متن کامل

Tokenizing an Arabic Script Language

In any natural language processing project, the input text needs to undergo tokenization before morphological analysis or parsing. For Arabic script languages the tokenization process faces more problems and it plays a more crucial role in natural language processing (NLP) systems for Arabic script languages. In this work we elaborate on some of these problems and present solutions for these. T...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012